Enable coredumps on travis-ci #703

Merged
merged 2 commits into from Jun 21, 2016

grondo
Contributor

@grondo grondo commented Jun 21, 2016

Experimental PR to support coredump backtraces in travis-ci.

The travis-ci containers have /proc/sys/kernel/core_pattern set to |/usr/share/apport/apport %p %s %c %P, possibly inherited from the host kernel (see travis-ci/travis-ci#3754). When apport is not installed, all coredumps are lost. Until travis implements a method to reset core_pattern, installing the apport package seems to direct corefiles to the cwd of the dumping process, so we can at least examine them.

apport also seems to respect the current corefile limit, so if ulimit -c unlimited is also set for the travis run, we do seem to be able to get corefiles inside travis.

To analyze any coredump that might be generated during a travis run, a backtrace-all.sh script is added and run from the after_failure hook.

Proof of concept can be seen from an experiment in another project here
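
For readers who want a concrete picture of the approach described above, here is a minimal sketch of a script that finds corefiles and prints a gdb backtrace for each. It is not the backtrace-all.sh added by this PR: the function names are hypothetical, and it assumes the caller already knows which executable dumped core.

```shell
#!/bin/sh
# Hypothetical sketch; not the backtrace-all.sh from this PR.

# List candidate corefiles (named "core" or "core.*") under directory $1.
# Note: the "core.*" pattern would also match source files such as core.c.
find_cores() {
    find "${1:-.}" -type f \( -name core -o -name 'core.*' \) 2>/dev/null | sort
}

# Print a backtrace of all threads for every corefile under directory $2.
# $1 is the executable that is assumed to have produced the corefiles.
backtrace_all() {
    exe=$1
    dir=${2:-.}
    find_cores "$dir" | while IFS= read -r core; do
        echo "=== backtrace for $core ==="
        gdb --batch -ex 'thread apply all bt' "$exe" "$core"
    done
}
```

A call such as backtrace_all ./my-program . from the after_failure hook would then print one backtrace per corefile; my-program is a placeholder name.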

Add a script to find and print backtrace for all corefiles
found under current directory.
@grondo grondo added the review label Jun 21, 2016
@coveralls

Coverage decreased (-0.08%) to 75.169% when pulling cafc642 on grondo:travis-coredumps into 9a65636 on flux-framework:master.

In order to support coredumps, install apport and gdb into the travis-ci
image. Installation of apport will allow coredumps to be saved, since
default travis container core_pattern is piped to apport, and the core dumps
are lost when apport is not installed (and there is no current way in
travis to change the default core_pattern).

Apport will obey the current corefile limit, so the corefile limit must also
be increased to unlimited.

Finally, in the after_failure script, the backtrace-all.sh script is always run
to find and print a backtrace for any corefiles left
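
The two conditions this commit message relies on (a piped core_pattern and a nonzero corefile limit) can be inspected from a build script. The following is a rough sketch under the assumption that /proc/sys/kernel/core_pattern is readable in the container; the function names are hypothetical and not part of this PR.

```shell
# Return success if a core_pattern value pipes dumps to a helper program.
# The kernel treats a pattern whose first character is '|' as a pipe.
core_pattern_is_piped() {
    case "$1" in
        '|'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Print the settings that decide whether a corefile can be written.
report_core_settings() {
    pattern=$(cat /proc/sys/kernel/core_pattern 2>/dev/null)
    echo "core_pattern: ${pattern:-unknown}"
    echo "core limit:   $(ulimit -c)"
    if core_pattern_is_piped "$pattern"; then
        echo "cores are piped to a helper; it must be installed to keep them"
    fi
}
```

Running ulimit -c unlimited in the same shell before the tests start, and then report_core_settings, would show whether the limit change took effect.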
@coveralls

coveralls commented Jun 21, 2016

Coverage decreased (-0.004%) to 75.247% when pulling 66e5502 on grondo:travis-coredumps into 9a65636 on flux-framework:master.

@grondo
Contributor Author

grondo commented Jun 21, 2016

I also removed use of libSegFault.so since the backtrace we were seeing with libSegFault wasn't so helpful, and I was worried it might unnecessarily perturb the backtrace or prevent a coredump. However, I tested it in my other repo and it seems to work fine, so I could add that back if we'd like.

@garlick
Member

garlick commented Jun 21, 2016

This should be handy!

Am I right in assuming that a libSegFault backtrace isn't likely to add any
info beyond what gdb gives us? If so then I'm for leaving it out.

@grondo
Contributor Author

grondo commented Jun 21, 2016

Am I right in assuming that a libSegFault backtrace isn't likely to add any
info beyond what gdb gives us? If so then I'm for leaving it out.

The only thought I had after the fact was that the libSegFault backtrace is issued at the time of failure, so it might help determine which test within an individual sharness script hit the segfault.

However, that might be obvious or unnecessary with the gdb backtrace.

@garlick
Member

garlick commented Jun 21, 2016

Merging per our discussion.

@garlick garlick merged commit bb83fa6 into flux-framework:master Jun 21, 2016
@garlick garlick removed the review label Jun 21, 2016
@grondo grondo deleted the travis-coredumps branch June 25, 2016 01:04
@springmeyer

@grondo - did you confirm this is working? I'm asking because I'm the original user that reported travis-ci/travis-ci#3754. I noticed your comment about installing apport and while your travis-ci/apt-package-safelist#2642 was rejected it looks like it is installing okay still. I do however see:

Setting up apport (2.0.1-0ubuntu17.13) ...
start: Job failed to start
invoke-rc.d: initscript apport, action "start" failed.

In your logs: https://travis-ci.org/flux-framework/flux-core/jobs/144604928#L203

Not sure if that is a problem or not. I'll try your method when I have time, just curious now if you are sure it is working: did you see core files generated for sudo:false travis machines?
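
One way to answer the question above on a given machine is to crash a throwaway process on purpose and look for a corefile afterwards. The sketch below assumes a POSIX sh and that a corefile, if written, lands in the working directory as core or core.*; the function names are hypothetical.

```shell
# Crash a throwaway child shell with SIGSEGV inside directory $1 and
# print the child's exit status (128 + 11 = 139 when killed by SIGSEGV).
crash_child() {
    (
        cd "$1" || exit 2
        # Raise the soft corefile limit for this subshell only; ignore
        # failure where the hard limit forbids it.
        ulimit -c unlimited 2>/dev/null || true
        rc=0
        sh -c 'kill -s SEGV $$' || rc=$?
        echo "$rc"
    )
}

# Report whether the crash left a corefile named "core" or "core.*" in $1.
check_core_generation() {
    status=$(crash_child "$1")
    echo "child exit status: $status"
    for f in "$1"/core "$1"/core.*; do
        if [ -f "$f" ]; then
            echo "corefile found: $f"
            return 0
        fi
    done
    echo "no corefile in $1"
    return 1
}
```

Calling check_core_generation on an empty scratch directory early in the build gives a quick yes or no before any real test crashes.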

springmeyer pushed a commit to springmeyer/travis-coredump that referenced this pull request Jul 14, 2016
@grondo
Contributor Author

grondo commented Jul 14, 2016

@springmeyer, It did work at some point, as the proof of concept in another project was able to generate coredumps and backtrace: https://travis-ci.org/grondo/io-watchdog/jobs/139202348

apport always seems to write coredumps as core in this case, so if you have multiple processes in the same directory that might generate a corefile, they may overwrite each other.

I have not checked lately though, so it is possible this has broken in travis-ci since that last test. I hope not though.
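
A possible workaround for the overwrite problem, sketched here as an assumption rather than something this PR does, is to rename any corefile right after each test finishes so the next crash cannot replace it. The function name and the label argument are hypothetical.

```shell
# Move a corefile named "core" in directory $1 to "core.<label>".
# Prints the new path on success; returns 1 if there was no corefile.
preserve_core() {
    dir=$1
    label=$2
    [ -f "$dir/core" ] || return 1
    mv "$dir/core" "$dir/core.$label" && echo "$dir/core.$label"
}
```

A test runner could call preserve_core with the test name as the label after each test. This only helps when crashes happen one at a time; two processes dumping core in the same directory at the same moment would still collide.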

@grondo
Contributor Author

grondo commented Jul 14, 2016

@springmeyer, actually apport was whitelisted here in apt-package-whitelist commit 77814f240652a97a3e1e6b5398ca8ea168e137e2.

I did get the same error during apport installation in the successful test, so I'm hopeful it will work for you. (https://travis-ci.org/grondo/io-watchdog/jobs/139202348#L166)

springmeyer pushed a commit to springmeyer/travis-coredump that referenced this pull request Jul 14, 2016
# The first commit's message is:
# This is a combination of 7 commits.
# The first commit's message is:
try sudo:false

# This is the 2nd commit message:

debug ulimit set not working with sudo:false

# This is the 3rd commit message:

further debug core file existance

# This is the 4th commit message:

look for corefile at /usr/share/apport/apport

# This is the 5th commit message:

back one directory

# This is the 6th commit message:

look in /var/crash

# This is the 7th commit message:

try catchsegv

# This is the 2nd commit message:

fix syntax

# This is the 3rd commit message:

Poke

# This is the 4th commit message:

try installing apport like flux-framework/flux-core#703

# This is the 5th commit message:

remove catchsegv

# This is the 6th commit message:

remove apport
@springmeyer

Thanks @grondo - looks like things work and that log warning is not fatal.

@garlick garlick mentioned this pull request Aug 2, 2016

4 participants